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-  -  'We  present  necessary  and  sufficient  conditions  for  blind  equalization/deconvolution  (without 
observing  the  input)  of  an  unknown,  possible  non-minimum  phase  linear  time  invariant  system 
(channel).  Based  on  that,  we  propose  a  family  of  optimization  criteria  and  prove  that  their  solution 
correspond  to  the  desired  response.  These  criteria,  and  the  associated  gradient-search  algorithms, 
involve  the  computation  of  high  order  cumulants.  The  proposed  criteria  are  universal  in  the  sense 
that  they  do  not  impose  any  restrictions  on  the  probability  distribution  of  the  input  symbols.  We 
also  address  the  problem  of  additive  noise  in  the  system  and  show  that  in  several  important  cases, 


e.g.  when  the  additive  noise  is  Gaussian,  the  proposed  criteria  are  unaffected. 
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I.  Introduction 

Inverse  filtering,  or  blind  equalization/deconvolution,  is  a  problem  of  considerable  practical 
interest  in  diverse  fields  including  seismology,  radio  astronomy,  underwater  acoustic  telemetry, 
and  data  communication.  The  problem  is  illustrated  in  Figure  1.  We  observe  the  output  yn  of 
an  unknown  possibly  non-minimum  phase  linear  time-invariant  system  H  with  input  an  being 
a  sequence  of  independent  identically  distributed  (i.i.d.)  random  variables  with  a  prespecified 
probability  distribution.  We  want  to  recover  the  input  sequence,  or  equivalently,  to  identify  the 
inverse  H -1  of  H  using  a  tap-delay  line  C.  This,  in  turn,  requires  the  identification  of  both 
the  magnitude  and  the  phase  of  the  unknown  system’s  transfer  function.  The  magnitude  can  be 
identified  using  second  order  moments  of  the  output  signal.  However,  phase  identification  requires 
the  calculation  of  higher  order  moments/cumulants. 

The  paper  by  Sato  [1]  and  Godard  [2]  approach  the  problem  of  blind  equalization  by  introducing 
new  criteria,  different  from  the  mean  square  error  (m.s.e.)  criterion  used  for  trained  equalizers,  and 
then  apply  gradient-search  algorithms  to  optimize  the  selected  criteria.  Sato’s  method  and  Godard’s 
method  are  further  analyzed  by  Benveniste  and  Goursat  [3]  and  by  Foschini  [4],  respectively,  and 
the  conditions  necessary  to  ensure  the  convergence  of  the  respective  algorithms  axe  specified. 

The  paper  by  Benveniste,  Goursat,  and  Ruget  [5]  presents  several  concepts  and  results  that 
significantly  contributed  to  the  understanding  of  the  problem.  First,  it  has  been  established  that 
a  criterion  based  on  second  order  statistics,  e.g.  the  m.s.e.  criterion,  is  insufficient  for  phase 
identification.  For  that  reason,  the  problem  cannot  be  solved  when  the  probability  distribution 
of  the  input  symbols  is  Gaussian  since  the  second  order  moments  completely  specify  the  input- 
output  statistics.  Next,  it  has  been  proven  that  a  sufficient  condition  for  equalization  is  that  the 
probability  distribution  ol  the  output  (recovered)  symbols  zx  be  equal  the  probability  distribution 
of  the  input  symbols  a,-.  This  principle  is  then  used  to  formulate  a  general  class  of  criteria  that 
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converge  to  the  desired  system  response  under  the  assumption  that  the  probability  distribution 
of  the  input  symbols  belong  to  a  certain  class  of  continuous-type  distributions.  This  is,  however, 
a  rather  restrictive  condition;  for  example,  in  digital  communication®  the  input  distribution  is 
inevitably  of  discrete  type. 

The  paper  by  Shalvi  and  Weinstein  [6]  proves  that  it  is  sufficient  to  equalize  the  second  and 
fourth  order  cumulants  of  the  input  and  the  output  probability  distributions.  Based  on  that,  new 
criteria  are  presented,  that  require  only  partial  knowledge  of  the  input  distribution.  Godard’s 
criterion  is  shown  to  be  a  special  case  of  these  criteria.  An  important  feature  of  the  proposed 
criteria  is  that  their  maximization  correspond  to  the  desired  response,  and  they  do  not  suffer 
from  unwanted  local  maxima.  Therefore,  the  associated  gradient-search  algorithm  is  expected  to 
converge  to  the  desired  response  regardless  of  initialization. 

In  this  paper  we  extend  the  results  of  [6].  First,  we  present  necessary  and  sufficient  conditions 
for  equalization  based  on  high  order  cumulants  of  the  corresponding  probability  distributions.  We 
then  propose  a  general  class  of  criteria,  prove  that  their  optimization  must  yield  the  desired  solution, 
and  present  the  associated  gradient-based  algorithms.  These  criteria  are  universal  in  the  sense  that 
they  do  not  impose  any  restrictions  on  the  probability  distribution  of  the  input  symbols.  We  briefly 
address  the  problem  of  additive  noise  in  the  system  and  show  that  in  some  important  cases,  e.g. 
when  the  additive  noise  is  Gaussian,  the  proposed  criteria  are  unaffected. 
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II.  Mathematical  Preliminaries 

Let  xi ,  *2, . . .  z„  be  a  set  of  real/complex  random  variables  with  the  joint  characteristic  function 

$(u>i,  tu2,  • .  .wn)  =  E  {e*  w,x'  | 

where  j  =  a/— 1.  and  E{-}  stand  for  the  expectation  operation. 

The  joint  cumulants  are  defined  by: 

cum(zi  :  p:;z2  :  p2; •  •  •  \xn  :  pn)  = 

_  /  ap  ffl,/n$(tPi,tP2,...tt>w) 

where  p,  are  non-negative  integers,  and  p  =  p,. 

For  notational  convenience,  if  pt  =  1  we  do  not  write  it  in,  that  is: 

cum(. . . ;  Xi  :  1; . . .)  =  cum(. . . ;  z,-; . . .) 

The  following  properties  can  be  verified: 

(P-1) 
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(p.4)  If  x\,x2,.  ..in  are  jointly  Gaussian,  then: 

cum(ii  :  pi\X2  :p2\...;xn  :  pn )  =  0 

whenever  p  =  £.n=i  Pi  >  2. 
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III.  Development  of  the  Criteria 


The  basic  system  of  interest  is  illustrated  in  Figure  1.  We  shall  make  the  following  assumptions: 

(i)  The  input  sequence  a,  consists  of  real/complex  i.i.d.  random  variables  with  a  non-Gaussian 
but  otherwise  completely  arbitrary  continuous/discrete  probability  distribution.  We  shall 
assume  the  existence  of  certain  cumulants  of  a,-. 


(ii)  The  unknown  system  (channel)  H  =  {h,}  is  a  possibly  non-minimum  phase  linear  time 
invariant  filter  whose  transfer  function  has  no  zeros  on  the  unit  circle,  that  is: 


H{u)  =  hieju,i  #0  0  <  u  <  2tt 


(1) 


(iii)  The  equalizer  C  =  {c,}  is  a  tap-delay  line  of  sufficient  length  so  that  truncation  effects  can 
be  ignored. 

Let  S  =  {s,}  denote  the  combined  channel-equalizer  response,  that  is  the  convolution  of  H  with 
C  (see  Figure  1).  Thus 


Si 


t  —  o  c,'  —  ^  ]  ci hi -i 


and 


Zi  —  ctj  o  Si  —  )  ^  Sf  Q>i—l 


(2) 


(3) 


Invoking  properties  (p.l)  and  (p.2) 


cum(z,  :  p;  z*  :  q )  =  cum  (z,-;  z,-; . . . 

zi}z* ;  z*; . . .  z* 

p  time* 

/ 

g  time* 

=  cum  Y^shai-h',YsSl*ai-h'T  • 

\  h  h 

b 

\ 

^  ^  ^i—ki »  ^  ^kj  Q'x—ki  >  • 

-;E 

ki  fej 

*«  ) 

-EE-EEE  •  •  •  £  *1,  sh  •  •  •  slpsmkl  si  cum  (a,.,, ;  a,_,2 ; . . . 
h  h  b  fci  h  k. 


Invoking  properties  (p.3)  and  (p.l) 


cum  •  •  • !  ai-ki  >  at-Jc2i  •  •  • !  — 

' 

cum(oi  :p;a’  :  q)  [x  =  l2  =  . . .  =  lp  =  kx  -  k2  =  . . .  =  kq 
0  otherwise 

Substituting  (5)  into  (4) 


(5) 


cum(zi  :  p ;  z*  :  q )  =  -sf^F  j  cum(a,-  :  p;  a*  :  9)  (6) 

This  equation,  relating  the  cumulants  of  the  output  symbols  z,  to  the  cumulants  of  the  input 
symbols  a,,  form  the  basis  to  the  subsequent  development.  We  shall  distinguish  between  the 
complex  case  in  which  at  and/or  s,  are  complex  valued  sequences  and  the  real  case  in  which  both 
sequences  are  real  valued. 

Theorem  1.  (complex  case) 

Let  a,  be  a  sequence  of  real/complex  i.i.d.  random  variables  with  some  given  probability 
distribution.  Suppose  that  cum(a,-;a,*)  >  0  and  cum  (a,  :  p;a“  :  q)  ^  0,  where  p  and  q  are  some 
non-negative  integers  such  that  p  +  q  >  2,  exist.  Let  be  a  deterministic  (possibly  infinite) 
real/ (..implex  valued  sequence.  Let  the  random  variables  z,  be  specified  by  (3). 

If 


cum(z,;z*)  =  cum(o,;a,*) 


(7) 


Then: 


jcum(z,-  :  p;  z-  :  ?)|  <  |cum(a<  :  p;  a*  :  ?)| 


(8) 


where  equality  holds  if  and  only  if 


si 


=  = 


pi* 


l  =  k 
l  ^  k 


(9) 
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for  some  fixed  integer  k  and  4>tR} . 
Proof:  Invoking  (6)  for  p  =  q  =  1,  if 


Then 


Hence,  for  p  +  q  >  2, 


Thus,  invoking  (6), 


V-  I.  12  _  cum(z,;z*)  , 

Vs'1  -  c^(a~aT)  -  1 


M’  <  1 


\S,\  <  1 


Np+9  <  N2 


\cum(zx  :  p;  z'  :  g)|  _  ^  p 

|cum(oi  :p;a*  :  g)|  "  1  1 

<Enp+,^En2  =  1 
(  / 

where  equality  holds  if  and  only  if  s;  satisfies  (9)  for  some  fixed  integer  k  and  4>t7ll .  Q.E.D. 

Note  that  the  condition  cum(a,;a*)  =  £{|a,  -  £{ai}|2}  >  0  implies  that  a,  is  a  non-trivial 
random  variable  with  a  non-zero  variance.  The  assumption  cum(a,  :  p;a*  :  q)  ^  0  where  p  +  q  >  2 
implies  that  the  probability  distribution  of  the  input  symbols  is  non-Gaussian  (see  property  (p.4)). 
Corollary  1.1 

Under  the  assumptions  of  theorem  1.1, 


zx  = 


(w.p.l.) 


for  some  fixed  integer  k  and  fall}  if  and  only  if 


aim(zj;z‘)  =  cum(a,;a’) 
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and 


|cum(z,-  :  p;  z-  :  ?)|  =  |cum(a,-  :  p;  a *  :  ?)|,  (12) 

for  any  non-negative  integers  p  and  q  such  that  p  +  q  >  2. 

Proof:  By  theorem  1,  if  (11)  and  (12)  are  satisfied  then 

*i  =  k  (13) 

for  some  fixed  integer  k  and  fall1.  Substituting  (13)  into  (3)  immediately  yields  (10). 

On  the  other  hand,  if  (10)  is  satisfied,  then  by  properties  (p.l)  and  (p.2),  for  any  non-negative 
integers  p  and  e, 


cum(zt-  :  p\  z*  :  q)  =  cum  (z,-;  z,; . . . ;  2,;  z*;  z*; . . .  z*) 


p  terms 


9  terms 


=  cum  ;  e-^a^k ; 

- - - - V - ' 

p  terms 

q  terms 

=  gi(p  ?)^cum  (o,_fc|  <*t— fci  •  •  •  ?  ai— k  i  ai— Jfc)  ai— Jti  •  •  ■  i  a«  —  k) 

" - - - -  - - - - ' 

P  times  q  times 

=  e’<p-?^cum(o,;a,;...:a,-;  a*; a- ;...;<) 

> . .  — - -  ' - * - ' 

p  terms  q  terms 

=  e^p-,^cum  (a,-  :  p;  a*  :  q) 


(14) 


which  implies  (11)  and  (12).  Q.E.D. 

Corollary  1.1  asserts  that  the  output  (recovered)  sequence  is  identical  to  the  input  sequence  up 
to  a  constant  delay  and  possibly  a  constant  phase  shift  if  and  only  if  the  variance  of  the  individual 
z,  and  the  magnitude  of  any  of  the  non-zero  cumulant  of  order  p  +  q  >  2  are  equal  to  that  of  a,, 
the  constant  delay  is  unavoidable  because  of  the  stationarity  of  the  input  sequence.  However,  the 
phase  shift  can  be  identified  in  some  cases,  as  indicated  below. 
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Corollary  1.2 

Under  the  assumptions  of  theorem  1, 

Zi  =  e^cn-k  w.p.l  (15) 

where  k  is  some  fixed  integer  and 

(p  -  q)<f>  =  2nN,  N  -  integer  (16) 

if  and  only  if 

cum(zi;z*)  =  cum(a.j;a*)  (17) 

and 

cum  (z,  :  p;  z*  :  q )  =  cum  (a,  :  p;  a‘  :  q)  (18) 

for  some  non-negative  integers  p  and  q  such  that  p  +  q  >  2. 

Proof:  If  z,  and  a,  are  related  by  (15),  then  the  corresponding  cumulants  axe  related  by  (14), 
which  immediately  imply  (17)  and  (18). 

On  the  other  hand,  if  (17)  is  satisfied,  then  by  theorem  1 

|cum  (zi  :  p;  z*  :q)\  =  |cum  (o,-  :  p;  a*  :  q)  | 

if  and  only  if 

Ij*  l  —  k 

(19) 

0  Ijtk 

for  some  fixed  integer  k  and  4>(.V} ,  in  which  case 

cum  (z,  '•  p;  z*  :  q)  =  ^  p  ,  = 
cum(a,-  :  p;a*  :  9)  \  1  ' 

which  is  equal  to  1,  if  and  only  if  (16)  is  satisfied.  Substituting  (19)  into  (3)  immediately  yields 
(15).  Q.E.D. 
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Corollary  1.2  asserts  that  if  cum(a,  :  p;a*  :  q)  ^  0  where  p  ^  q,  then  the  indicated  phase  shift 
can  be  identified  up  to  an  ambiguity  of  order  \p  -  q\.  Thus,  if  \p  -  q\  =  1,  then  the  phase  shift 
between  the  input  and  recovered  symbols  can  be  completely  eliminated.  However,  we  note  that  in 
most  signal  constellations  used  for  data  communications,  the  input  distribution  is  symmetric  under 
rotation,  in  which  case  all  cumulants  of  order  p  q  are  equal  to  zero. 

Theorem  2  (real-case) 

Let  a,  be  a  sequence  of  real  i.i.d.  random  variables  with  some  given  probability  distribution. 
Suppose  that  cum(ai  :  2)  >  0  and  cum(a{  :  p)  ^  0,  where  p  >  2,  exist.  Let  s,  be  a  deterministic 
(possibly  infinite)  real- valued  sequence.  Let  the  random  variables  z,  be  specified  by  (3). 

If 

cum(z;  :  2)  =  cum(a,  :  2)  (20) 


Then: 


where  equality  holds  if  and  only  if 


cum(z,  ;  p)  < 
cum(a,  :  p)  “ 


Si  =  p<5i_* 


where  k  is  some  fixed  integer  and 

1  p  odd 

±1  p  even 

Proof:  Invoking  (6)  for  p  =  2  and  q  =  0,  if 

2  _  cum(z,-  :  2)  _ 
^  St  cum(a,'  :  2)  ~ 


Then 


sf  <  1  V/ 


(21) 


(22) 


(23) 
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Thus 


and 

cum(z,'  :  p)  =  cum(a,-  :  p),  (26) 

for  any  p  >  2. 

Proof:  By  theorem  1,  if  (25)  and  (26)  are  satisfied,  then 

si  =  pf>\-k  (27) 

where  p  is  specified  by  (23).  Substituting  (27)  into  (3)  immediately  yields  (24). 
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On  the  other  hand,  if  (24)  is  satisfied,  then  following  the  same  development  as  in  (14) 

cum(z,-  :  p )  =  ppcum(a,-  :  p)  (28) 

where  by  (23)  f?  —  1.  Q.E.D. 

Recall  [5]  theorem  2.2,  a  sufficient  condition  for  the  output  (recovered)  sequence  to  be  identical 
to  the  input  sequence,  up  to  a  delay  and  a  sign,  is  that  the  probability  distribution  of  the  individual 
Zi  be  equal  the  probability  distribution  of  a,-  According  to  corollary  2.1,  it  is  necessary  and 
sufficient  to  equalize  the  second  order  cumulant  and  any  other  non-zero  cumulant  of  order  p  >  2. 
Furthermore,  if  the  higher  order  cumulant  being  used  is  odd,  the  sign  ambiguity  can  be  resolved. 
Theorem  1  suggests  the  following  family  of  equalization  criteria,  indexed  by  p  and  q: 

Max|cum(z,-  :  p;z*  :  q)\ 

Subject  to  :  cum(z,;z*)  =  cum(a;;a*)  (29) 

We  note  that  the  choice  p  =  q  =  2  yields  the  optimization  criterion  proposed  in  [6],  In  [6]  it 
is  assumed  that  the  input  symbols  are  zero-mean  (i.e.,  JE?{a,}  =  0),  in  which  case  we  obtain  the 
average  power  constraint  2?{|z,j2}  =  _E{|a,j2}. 

In  the  real  case,  theorem  2  suggests  the  following  family  of  criteria,  indexed  by  p: 

Max  {sign  [cum(aj  :  p)]cum(z,-  :  p)} 

Subject  to  :cum(z,-  :  2)  =  cum(a,  :  2)  (30) 

where 

-1  x  <  0 

sign  (a:)  =  (31) 

+  1  x  >  0 

Recall  (6),  if  p  is  even  and  3/  are  real  valued, 

sign  (cum(z,-  :  p))  =  sign  (cum(a,  :  p))  (32) 
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Thus,  for  even  p,  (30)  reduces  to: 


Max|cum(z,-  :  p)| 

Subject  to  :  cum(zt-  :  2)  =  cum(a,  :  2)  (33) 

By  the  theorems,  the  set  of  solutions  of  these  constrained  optimization  criteria  correspond  to 
the  desired  response.  We  now  want  to  make  sure  that  the  criteria  function  do  not  have  spurious 
local  maxima. 

Consider  first  the  real  case.  By  (6),  the  constrained  maximization  in  (30)  is  equivalent  to: 

Max^sf 

l 

Sub ject  to  :  ^  s]  =  1  (34) 


By  the  constraint 


«>*  1-X>? 
\  1*0 


By  the  constraint,  there  is  at  least  one  non- zero  s/.  Thus,  we  suppose,  without  any  loss  of 
generality,  that  0  <  |s0|  <  1.  Substituting  (35)  into  (34)  we  obtain: 


/U)  =  E5f+  l-E5?  (36) 

¥  o  \  <*  o  / 

Clearly,  the  maximization  of  /($)  is  equivalent  to  solving  (34).  We  note  that  /(^)  is  well  defined 
for  S1  =  1  “  5o  <  1»  which  is  the  domain  of  interest.  Differentiating  (36), 

(p  \  p/2— l" 

1-£S?)  j  (37, 

The  roots  of  (37)  are,  for  even  p: 


and  for  odd  p: 


Si  =  0  or  Si  =  ±  I  1  -  Yi  5?  =  ±s0 

\  ¥0  / 


Si  =  0  Or  Si  as  +  1  -  ^  5? 

V  ¥o 
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Thus,  recall  the  constraint  s]  =  1>  the  stationary  points  of  f(s )  for  even  p  are  all  vectors 
s  having  M  non-zero  components  ( M  =  1,2,3,...)  that  are  equal  to  For  odd  p,  the 

stationary  points  of  f(s)  are  all  vectors  s  having  M  non-zero  components  that  are  all  equal  either 
to  -1/VM  or  to  +1  j\fM. 

To  determine  weather  that  stationary  points  are  local  maxima,  local  minima  or  unstable  equi¬ 
libria  (saddle  point),  we  need  to  calculate  the  Hessian,  that  is  the  matrix  partial  derivatives 
d2f{s.)/dsidsj  at  the  stationary  points.  Following  straight  forward  algebraic  manipulations,  the 
Hessian  is  given  by: 


where 


—-(if  _ 


where  a  -  - 1  in  case  the  stationary  point  is  the  vector  $  whose  non-zero  components  are  all  equal 
to  —1/VM,  and  a  =  + 1  in  case  the  stationary  point  is  the  vector  s  whose  non-zero  components 
are  all  equal  to  +1  /VM.  We  note  that  av  =  1  for  even  p.  The  ( M  -  1)  X  ( M  -  1)  block  at  the 
upper  left  corner  of  the  matrix  corresponds  to  the  non-zero  components  of  the  vector  s  (excluding 
So)-  If  M  a=  1,  the  Hessian  reduces  to  a  diagonal  matrix  whose  diagonal  elements  are  all  equal  to 
p  =  -app.  Thus,  for  even  p  all  the  eighenvalues  of  the  matrix  are  negative  indicating  that  all  the 
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vectors  s  having  only  one  non-zero  component  of  magnitude  1  must  be  local  maxima.  By  theorem 
2,  we  already  know  that  these  are  also  the  global  maxima.  For  odd  p ,  all  the  eighenvalues  are 
either  negative  if  a  =  1,  or  positive  if  a  =  —1.  Thus,  for  odd  p,  all  vectors  s  having  one  non-zero 
component  that  is  equal  to  1  are  local  maxima,  by  theorem  2  they  are  global  maxima,  and  all 
vectors  £  having  one  non-zero  component  that  is  equal  to  -1  are  local  minima.  In  fact,  they  are 
global  minima.  This  indicates  once  again  the  ability  to  resolve  sign  ambiguity  in  case  p  is  odd. 

If  M  =  2,  one  eighenvalue  of  the  Hessian  equals  to  2 r?,  and  all  other  eighenvalues  equal  to  p. 
U  M  >  2,  it  can  easily  be  shown  that  one  eighenvalue  equals  to  (M  -  2)t],  ( M  -  2)  eighenvalues 
are  equal  to  2p,  and  all  other  eighenvalues  are  equal  to  p.  Since  p  and  77  have  opposite  signs,  it 
indicates  that  all  other  stationary  points  specified  by  M  =  2,3,...  are  unstable  equilibria  (saddle 
points).  We  can  therefore  state  the  following  lemma: 

Lemma  2.1  (real-case) 

Under  the  assumptions  of  theorem  2,  the  only  local  (hence,  global)  maxima  of  the  constrained 
optimization  in  (34),  or  equivalently  (30),  correspond  to  the  desired  solution  s,  =  p6,~k,  or  equiva¬ 
lently  Z{  -  pax_k,  where  k  is  an  arbitrary  constant  delay,  and  p  is  specified  by  (23). 

The  proof  of  the  lemma  is  self-evident. 

We  now  turn  to  the  complex  case.  If  p  =  q,  then  the  constrained  maximization  in  (29)  is 
equivalent  to: 

Max£>|Jp 

l 

Subject  to  |sf|2  =  1  (38) 

l 

The  optimization  in  (38)  is  identical  to  the  optimization  in  (34),  only  that  instead  of  $i  we  have 
the  variables  |s/|,  and  instead  of  p  we  have  2p.  Thus,  in  complete  analogy,  the  set  of  stationary 
points  of  the  constrained  optimization  in  (38)  are  all  vectors  s  having  M  non-zero  components  of 
magnitude  1/ >/M .  For  M  =  1  we  obtain  the  set  of  local  maxima  which,  by  theorem  2,  are  also  the 
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global  maxima.  All  other  stationary  points  for  M  =  2,3, . . .  axe  unstable  equilibria.  We  therefore 
conclude  the  following: 

Lemma  1.1  (complex  case) 

Under  the  assumptions  of  theorem  1,  the  only  local  (hence,  global)  maxima  of  the  constrained 
optimization  in  (38),  or  equivalently  (29)  with  p  =  q,  correspond  to  the  desired  solution  s,  =  e^Si-k, 
or  equivalently  Z{  =  ej4>a,_fc,  where  k  is  some  fixed  delay  and  <f>  is  an  arbitrary  phase. 

Comment:  The  anlysis  of  the  stationary  points  in  the  complex  case  with  p  q  is  more  compli¬ 
cated  and  will  therefore  not  be  presented  here.  As  already  pointed  out,  in  most  signal  constellations 
used  for  data  communications,  the  probability  distribution  of  the  input  symbols  is  symmetric  under 
rotation,  in  which  case  only  p  =  q  is  of  interest. 
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ALGORITHM  IMPLEMENTATION 


The  constrained  optimizations  in  (34)  and  in  (38)  give  rise  to  the  following  gradient-based 
algorithm: 

nr 

(39) 

LT&i 

1 


/  c^F 

Si  =  St  +  6d7i 


s"  - 


(40) 


vSkp  ' 

where  F  is  the  objective  function  to  be  maximized,  6  >  0  is  the  step-size,  s,  are  the  unit  sample 
response  coefficients  of  the  combined  channel-equalizer  prior  to  the  iteration,  and  s"  are  the  unit 
sample  response  coefficients  after  the  iteration.  The  normalization  operation  in  (40)  is  necessary  to 
satisfy  the  indicated  constraint.  In  the  case  of  (34),  F  =  J2i  sf  >  and  in  the  case  of  (38)  F  =  Yli  |sj|2p. 
Since  in  both  cases  it  has  been  proved  that  F  has  no  spurious  local  maxima,  the  algorithm  is 
expected  to  converge  to  the  desired  solution  regardless  of  initialization. 

A  common  measure  of  equalization  performance  is  the  inter-symbol-interference  (ISI)  defined 


by: 

ISI(i)  =  — f2|2~  **-■-  (41) 

Mmax 

where  |s|max  is  the  component  of  s  having  the  maximum  absolute  value.  Clearly,  ISI  =  0  if  and 
only  if  s  has  only  one  non-zero  component  -  that  is  the  desired  response,  and  small  ISI  indicates 
the  proximity  to  the  desired  solution. 

To  analyze  the  rate  of  convergence  of  the  ISI  using  the  gradient-search  algorithm  in  (39),  (40) 
we  suppose,  without  any  loss  of  generality,  that  so  is  the  component  of  £  having  the  largest  absolute 
value.  Thus, 


ISI(i)  =  £  |s,|!/|>ol!  (42) 

WO 

Now,  if  F  =  J2i  l3/l2p  (complex  case),  then  dF/ds ,•  =  2p|s;|2p“2s,,  in  which  case 


K 1  M  1  +  2p*|s,i2p-2  \ji[ 

ki  K,r  i + 2p%>i2p-2 '  M 


18 


Since  ls;|2  =  1  and  since  |s;|  <  |so|  <  1,  then: 

1  <  1  +  2p6\si\2p~2  <  1  +  2 pS,  i  ?  0 


(44) 


Using  (44)  in  (43), 

*  M  <  K1  -  M 

1  +  2 pS  |«o|  ~  |s£|  V  |a0| 


Therefore,  using  (45)  in  (42), 


(45) 


(i  +  W1S1U)  s  ISI(/)  <  ISI(i)  (46) 

The  right  inequality  asserts  that  the  ISI  monotonically  decreases  from  iteration  to  iteration, 
where  the  left  inequality  sets  an  upper  bound  on  the  rate  of  convergence  of  the  ISI.  It  asserts  that 
the  factor  of  improvement  in  the  ISI  from  iteration  to  iteration  is  bounded  by  1/(1  +  2p<5)2.  Near 
the  point  of  convergence,  |so|  %  1  and  |s,|  «  0  for  t  0,  in  which  case  the  left  inequality  is  tight 
indicating  that  the  rate  of  convergence  is  approximately  1/(1  +  2 p6)2.  It  therefore  suggests  to  use 
a  large  step-size,  where  the  left  inequality  in  (46)  ensures  monotonic  convergence  for  any  choice  of 
6. 

If  we  choose  a  very  large  6,  the  term  <5§£  appearing  on  the  right  hand  side  of  (39)  becomes 
dominant,  and  in  the  limit  we  approach  the  following  algorithm: 


t-E 

•  ~  dsi 


s”  =s 


(47) 

(48) 


i/£iwp ' 

where  we  note  that  6  can  be  factored  out  of  (47)  because  of  the  normalization  operation  in  (48). 
In  this  setting,  the  iterated  s  is  determined  solely  by  the  direction  of  the  gradient.  In  this  case, 


l<l  ..  1*51  H*-1 

Kl  Kl  '  W2”-1 


(49) 
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since 


then 


2p-l 


(50) 


isiu")  =  ixivmi2  =  Eis«i2(2p'1)/isoi2(2p-1) 

t'^O  t'*0 

2p-l 

<  I  £  H2/M2 )  =  [isi(i)]2p_1 


(51) 


indicating  a  very  fast,  at  least  exponential,  convergence  of  the  ISI. 

Similar  results  can  be  obtained  in  the  real  case  where  F  =  Yli  sf  • 

The  algorithms  in  (39),  (40)  and  in  (47),  (48)  are  explicit  since  they  are  presented  in  terms 
of  the  unit  sample  response  coefficients  s,  of  the  combined  system  5.  We  want  to  express  these 
algorithms  in  terms  of  the  tapes  c,  of  the  equalizer  C.  To  do  that,  we  use  the  convolutional  relation 
between  s,  and  ct  : 


&\  —  0  c»  —  )  ' 


(52) 


where  h,  is  the  unit  sample  response  of  the  unknown  system  (channel)  H.  Invoking  the  chain  rule 
for  differentiation, 

flr  _ ar  ar 

(53) 


dF  dF ,  „  _  , .  OF 


dsi 

where  we  note  that  in  the  complex  case  dF/dci  =  8F/dRe(ci)  +  j8F/dIm(ci). 

By  assumption  (tt),  the  Fourier  transform  H(u>)  of  h,  contains  no  zeros  on  the  unit  circle. 
Therefore,  its  inverse  17-1(u)  exists.  We  denote  by  hf1  the  inverse  Fourier  transform  of 
Thus,  h"1  o  hi  =  Si.  Convolving  both  sides  of  (53)  by  (h z})* 


dF  8F 

err^  0  foi 


(54) 


Substituting  (52)  and  (54)  into  (39), 


dF 

hi  o  c\  =  hi  o  c,  +  6  (hzD*  o  — 


(55) 
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convolving  both  sides  of  (55)  by  h~  , 


c;  =  *  +  6  h-1  o  (hZ})‘  o  ^ 


(56) 


The  Fourier  transform  of  hf1  o  (/C*)’  is  1/| H(u)\2.  Thus,  if  H{u)  is  spectrally  white,  that  is: 


|J5T(w)|  =  1  0<w<2tt 


(57) 


then  h,-1  o  (hz})*  =  h,  and  (56)  reduces  to: 


c 


/ 

f 


C, 


(58) 


Also, 


£  M2  =  5  /_(’  is(w)l2<<«>  =  A  £’  |/r(u,)C(u)|2d«, 

=  ^  J  r  =  £  |c,|!  (59) 

Therefore,  normalizing  the  s;  is  the  san,  is  normalizing  the  q: 


(60) 


The  algorithm  specified  by  (58),  (60)  is  identical  to  (39),  (40)  if  H  is  spectrally  white,  implying 
a  spectral  pre-whitening  operation.  Only  in  this  case,  a  gradient  step  in  the  S  domain  i6  equivalent 
to  a  gradient  step  in  the  C  domain. 

Similarly,  under  the  spectral  pre-whitening  operation,  the  algorithm  is  (47)  (48)  is  equivalent 


to: 


(61) 

(62) 
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Next,  we  want  to  develop  an  explicit  expression  for  the  partial  derivatives  dF/dCm.  In  the 
complex  case  with  p  =  q, 

F  =  |cum(*i  :  p;  :  p)\  = 

=  |cum(a,  :  p;  a-  :  p)\  N<|2p  (63) 

l 


Therefore, 


qJ~  =  2p|cum(a{  :  p;a*  :  p)|spl(0P  1 


Substituting  (64)  into  (53), 

8F 


=  2p|cum(a<  :  p;a-  :  p) |  £s?(s*)p  r 


cum  (zi  :  p;  z*  :  p  -  1;  y*_TO)  = 


]C  si, 

\  b 

/p 

Skp-iai-kp-i 

i  ^lkpai-m-kI 

-  sipsk,  •  •  •  •sfc„_, • 

/l  ip  fcl  fcp 

•  cum  ;  •  •  • ;  &i-ip\  ai~kp‘i  •  •  • !  a»-kp_i  I  a«-m*p) 
=  cum(ai  :p;a' :p)^sf(snp_1/i^m 


Combining  (65)  with  (66) 


0F  0_|cum(a,  :p;af  :  p)| _ _ .  x 

~a~  —  2p— — -7-— — — — — — T- cum  (z,-  •  P)  -P  1 1  ili-m) 

ocm  cum(a,  :  p;  a,  :  p) 


=  2p  •  sign[cum(ai  :  p;  a*  :  p)]cum(z<  :  p;  z-  :  p  -  1;  y*_m ) 


In  the  real  case, 


F  =  sign[cum(a,  :  p)]cum(z,  :  p) 
=  |cum(a<  :  p)|  ^  sf 
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Following  the  same  considerations  leading  from  (63)  to  (67), 


a —  =  P  sign[cum(a,  :  p)]cum(r,  :  p  -  1;  y«-m)  (69) 

ocm 

The  computation  of  the  gradient  using  either  (67)  or  (69)  requires  knowledge  of  the  sign  of 
the  corresponding  cumulant  of  a^,  and  this  is  the  only  prior  information  concerning  the  input 
distribution  that  is  required.  We  also  need  to  know  the  joint  cumulant  of  the  input  y,  and  the 
output  Z{  of  the  equalizer.  These  cumulants  can  be  estimated  using  the  available  data. 

To  demonstrate  that,  consider  the  real  case  with  p  =  3.  Assuming  that  a,  is  zero  mean,  y,  and 
2,  are  also  zero  mean,  in  which  case  (69)  reduces  to: 


-  sign[£{a?  }]£{*? y,_m} 


The  expectation  £{zt2y,_m}  may  be  approximated  using  the  current  realization  zfy.-m,  or  it 
may  be  estimated  by  the  cumulative  average  jj  zfy^m,  where  N  corresponds  to  the  number 
of  terms  in  the  sum.  These  approximations  result  recursive/ sequential  algorithms  (the  cumulative 
averaging  can  be  evaluated  sequentially)  in  which  we  perform  one  iteration  per  symbol. 

Alternatively,  using  the  input-output  relation: 

Zi  =  y;  o  q  =  ^  c/y,_i  (71) 


we  obtain: 


~  E  S  Vi- 


E  {yi-iyi-kiVi-h}  (72) 

fcj  fcj 

This  equation  suggests  a  non-recursive  iterative  algorithm  in  which  we  first  estimate  the  expec¬ 
tation  £{yi_/y,_fc,yi_fcj}  by  the  empirical  averaging  jj  yi-/y.-fcy«-it2  over  the  observed  block  of 
data,  and  then  we  iterate  using  the  proposed  algorithms  until  convergence  is  accomplished. 
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As  another  case,  consider  (69)  with  the  choice  p  =  4.  Assuming  the  a*  is  zero  mean,  the 
components  of  the  gradient  (subject  to  the  indicated  constraint)  are: 

d  F 

-r —  =  4  sign[cum(a,  :  4)]E{zfyi-m}  (73) 

ucm 

where  we  note  that  cum(a,  :  4)  =  £{a?}  -  3£{a^}.  The  term  can  similarly  be  ap¬ 

proximated  by  replacing  the  expectation  by  the  current  realization,  or  V  performing  empirical 
averaging. 

Consider  now  the  complex  case  with  p  =  q  =  2.  Once  again  we  assume  that  a;  is  zero  mean. 
For  simplicity,  we  further  assume  that  E{af}  =  0  (e.g.,  the  real  and  complex  components  of  a, 
are  statistically  uncorrelated  with  equal  variance  -  a  condition  that  is  satisfied  for  most  signal 
constellations  used  for  data  communications).  Under  these  assumptions,  (67)  reduces  to: 

dF 

—  =  4  sign[cum(c,-  :  2; a*  :  2)]£{|z,-|2zt-y*_m}  (74) 

This  formula  coincides  with  the  result  developed  in  [6].  The  expectation  in  (74)  can  be  estimated 
by  the  current  realization  (in  which  case  we  obtain  the  stochastic  gradient  algorithm  presented  in 
[6]),  or  by  cumulative  averaging  operation. 
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EFFECT  OF  ADDITIVE  NOISE 


So  far,  we  have  completely  ignored  the  presence  of  additive  noise  in  the  system.  To  study 
this  effect,  let  the  noise  (error)  signal  e,  be  modeled  as  the  output  of  a  linear  time-invariant  filter 
T  =  {t,}  driven  by  a  white  (i.i.d.)  noise  process  as  illustrated  in  Figure  2.  We  suppose  that 
a,-  and  u,-  are  statistically  independent.  Note  that  if  T  =  C  then  v,  represents  the  additive  noise 
generated  at  the  input  to  the  equalizer.  If  T  =  H oC,  the  represents  the  additive  noise  generated 
at  the  channel  input.  The  extension  of  the  model  to  include  several  noise  sources  is  straightforward. 

Invoking  the  statistical  independence  between  a,  and  t>,, 

cum(2,  :  p;  z’  :  q )  =  cum(a,  :  p;  a  •  :  q)  £  sf s’” 

/ 

+  cum(t>j  :  p;  tf  :  q)  ^  tft?  (75) 

l 

Thus,  if  the  cumulants  of  e,  =  u,  o  t,-  are  known,  or  independently  measured,  their  effect  can  be 
removed  by  a  simple  subtraction. 

Perhaps  the  most  interesting  observation  is  that  if  u,  is  Gaussian,  then  cum(t>,  :  p;  u*  :  q)  =  0 
for  p  +  q  >  2.  Therefore,  the  criteria  functions  in  (29)  and  (30)  are  unaffected  by  the  presence  of 
additive  Gaussian  noise.  Substituting  p  =  q  =  1  in  (75), 

cum(z,-;  z“)  =  cum(a,;a*)J^  M2  +  cum(v,q  v* )£  |t/|2  (76) 

l  l 

Thus,  the  additive  Gaussian  noise  only  affect  the  constraint.  Consequently,  it  can  be  verified 
that  the  optimization  criteria  still  converge  to  the  set  of  desired  solutions  up  to  a  constant  gain 
factor. 
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